fix(code_execution): handle model quirks with tool calls#6352
fix(code_execution): handle model quirks with tool calls#6352alexhancock merged 1 commit intoblock:mainfrom
Conversation
- Auto-prefix unprefixed tool names (read_module -> code_execution__read_module) - Parse stringified JSON arrays in search_modules terms parameter - Add note that all tool calls are synchronous (no async/await) - Improve search_modules documentation to clarify array parameter format Fixes issues where some models (e.g. Nemotron) strip tool prefixes or stringify array parameters. Signed-off-by: rabi <ramishra@redhat.com>
| - WRONG: Multiple execute_code calls, each with one tool | ||
| - RIGHT: One execute_code call with a script that calls all needed tools | ||
|
|
||
| IMPORTANT: All tool calls are SYNCHRONOUS. Do NOT use async/await. |
There was a problem hiding this comment.
hrm - I guess this is true - are we sure?
Also, I wonder if there is some other way to enforce this than prompting, but I think this is is ok @alexhancock
There was a problem hiding this comment.
I'm confused by this part of the code @rabi . Isn't the builtin function we provide async, and therefor don't any calls the model writes to functions which call a tool need to be awaited?
There was a problem hiding this comment.
It seems tools are exposed to the model as synchronous JavaScript functions via the Boa JS engine. create_tool_function wraps each tool using blocking_recv() which blocks until the async Rust handler responds. So, model-generated code I think should use const x = tool({...}) not await tool({...}).
There was a problem hiding this comment.
Makes sense @rabi. I had misremembered. Merged
| - Multiple terms: terms=["git", "shell"] (a JSON array, NOT a string) | ||
| - Regex patterns: terms="sh.*", regex=true | ||
|
|
||
| IMPORTANT: Do NOT stringify arrays. Use terms=["a","b"] not terms="[\"a\",\"b\"]" |
There was a problem hiding this comment.
have found saying NOT can use up attention and hurt performance, wonder if we need that so explicitly (if there is a way to feedback errors then that can help)
| tool_call: CallToolRequestParam, | ||
| cancellation_token: CancellationToken, | ||
| ) -> Result<ToolCallResult> { | ||
| // Some models strip the tool prefix, so auto-add it for known code_execution tools |
There was a problem hiding this comment.
is this special casing for code execution as it needs to prefix it, but in other cases - they work?
There was a problem hiding this comment.
I think code_execution tools (execute_code, read_module, search_modules) are the only tools the model calls directly when code_execution extension is enabled. Model is expected to call the tools with the extension prefix, but I suspect when all visible tools share the same prefix and the instruction is to use without prefix, some models strip them.
|
thanks @rabi - cc @alexhancock shoudl we do some benchmarks/checking with frontier models to make sure this is no worse (I am worryed about the use of "not" in the prompts as it consumes extra attention heads). |
michaelneale
left a comment
There was a problem hiding this comment.
seems a functional improvement, would like a +1 from @alexhancock though
alexhancock
left a comment
There was a problem hiding this comment.
LGTM other than the Q I asked
* main: (41 commits) Allow customizing the new line keybinding in the CLI (#5956) Ask for permission in the CLI (#6475) docs: add Ralph Loop tutorial for multi-model iterative development (#6455) Remove gitignore fallback from gooseignore docs (#6480) fix: clean up result recording for code mode (#6343) fix(code_execution): handle model quirks with tool calls (#6352) feat(ui): support prefersBorder option for MCP Apps (#6465) fixed line breaks (#6459) Use Intl.NumberFormat for token formatting in SessionsInsights (#6466) feat(ui): format large and small token counts for readability (#6449) fix: apply subrecipes when using slash commands (#6460) Fix: exclude platform_schedule_tool in CLI (#6442) Fix: Small update in how ML-based prompt injection determines final result (#6439) docs: remove SSE transport and rename to Streamable HTTP (#6319) fix: correct Cloudinary extension command and env variable (#6453) fix: add gap between buttons in MacDesktopInstallButtons.js (#6452) refactor: include hidden dotfiles folders in file picker search (#6315) upgraded safe npm packages (#6450) chore(deps): bump react-router and react-router-dom in /ui/desktop (#6408) chore(deps): bump lru from 0.12.5 to 0.16.3 (#6379) ...
* main: fix: require auth when running goose on non loopback address (#6478) chore(deps): bump hono from 4.11.3 to 4.11.4 in /ui/desktop (#6485) feat(cli): graceful fallback for keyring failures (#5808) fix: support global .gooseignore and negation patterns (#6157) docs: manual config for jetbrains (#6490) fix: Recipe slash command doesn't work with single optional parameter (#6235) fix(openrouter): Handle Gemini thoughtSignature for tool calls (#6370) docs: fix extensions page (#6484) Allow customizing the new line keybinding in the CLI (#5956) Ask for permission in the CLI (#6475) docs: add Ralph Loop tutorial for multi-model iterative development (#6455) Remove gitignore fallback from gooseignore docs (#6480) fix: clean up result recording for code mode (#6343) fix(code_execution): handle model quirks with tool calls (#6352) feat(ui): support prefersBorder option for MCP Apps (#6465) fixed line breaks (#6459) Use Intl.NumberFormat for token formatting in SessionsInsights (#6466) feat(ui): format large and small token counts for readability (#6449) fix: apply subrecipes when using slash commands (#6460)
Signed-off-by: rabi <ramishra@redhat.com> Signed-off-by: ThanhNguyxn <thanhnguyentuan2007@gmail.com>
…ased * 'main' of github.com:block/goose: fix(code_execution): serialize record_result output as JSON (#6495) perf(google): avoid accumulating thoughtSignatures across conversation history (#6462) fix(openai): make tool_call arguments optional and fix silent stream termination (#6309) fix: Improve error messages for invalid tool calls (#6483) fix: require auth when running goose on non loopback address (#6478) chore(deps): bump hono from 4.11.3 to 4.11.4 in /ui/desktop (#6485) feat(cli): graceful fallback for keyring failures (#5808) fix: support global .gooseignore and negation patterns (#6157) docs: manual config for jetbrains (#6490) fix: Recipe slash command doesn't work with single optional parameter (#6235) fix(openrouter): Handle Gemini thoughtSignature for tool calls (#6370) docs: fix extensions page (#6484) Allow customizing the new line keybinding in the CLI (#5956) Ask for permission in the CLI (#6475) docs: add Ralph Loop tutorial for multi-model iterative development (#6455) Remove gitignore fallback from gooseignore docs (#6480) fix: clean up result recording for code mode (#6343) fix(code_execution): handle model quirks with tool calls (#6352) feat(ui): support prefersBorder option for MCP Apps (#6465) fixed line breaks (#6459)
Signed-off-by: rabi <ramishra@redhat.com> Signed-off-by: fbalicchia <fbalicchia@cuebiq.com>
Summary
Fixes issues where some models (e.g. Nemotron) strip tool prefixes or stringify array parameters.
Type of Change
AI Assistance
Testing
Tested with vLLM and Nemotron Hybrid MoE Model (nemotron-3-nano-30b-a3b)